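The GPU Developer's Creed
The Creed establishes a basic philosophy that puts functional correctness and architectural isolation ahead of raw throughput. In the ROCm ecosystem, HIP makes massively parallel processing possible, so every kernel is treated as a high-risk, isolated black box.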
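1. Correctness First
In HIP development, a "fast" result that is statistically inconsistent is a failure. Before attempting any ROCm-stack, assembly-level, or register-pressure optimization, put verifiable mathematical correctness first. Without correctness, performance is meaningless.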
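One way to make "verifiable mathematical correctness" concrete is an element-wise comparison between the kernel output (copied back to the host) and a CPU reference. The sketch below is host-only C++ and makes no HIP calls. The names `ParityReport` and `check_parity` and the default tolerances are illustrative assumptions, not part of the Creed or of ROCm.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of an element-wise comparison between a kernel's output (already
// copied back to the host) and a CPU reference. All names are illustrative.
struct ParityReport {
    std::size_t mismatches = 0;  // number of elements outside tolerance
    std::size_t first_bad = 0;   // index of the first mismatch, if any
    double max_abs_err = 0.0;    // largest non-NaN absolute difference seen
};

inline ParityReport check_parity(const std::vector<float>& gpu_out,
                                 const std::vector<float>& cpu_ref,
                                 double abs_tol = 1e-6,   // assumed default
                                 double rel_tol = 1e-5) { // assumed default
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);
    ParityReport report;
    if (gpu_out.size() != cpu_ref.size()) {
        // A size mismatch means the whole result is suspect.
        report.mismatches = std::max(gpu_out.size(), cpu_ref.size());
        return report;
    }
    for (std::size_t i = 0; i < cpu_ref.size(); ++i) {
        const double g = gpu_out[i];
        const double c = cpu_ref[i];
        const double err = std::fabs(g - c);
        // A NaN makes this comparison false, so NaNs count as mismatches.
        const bool ok = err <= abs_tol + rel_tol * std::fabs(c);
        if (!ok) {
            if (report.mismatches == 0) report.first_bad = i;
            ++report.mismatches;
        }
        if (err > report.max_abs_err) report.max_abs_err = err;
    }
    return report;
}
```

A test harness would treat any `mismatches > 0` as a failed run, however fast the kernel was. The right tolerances depend on the algorithm and data type, so the defaults here are placeholders.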
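2. Diagnostic Coverage Through Isolation
Enforce strict separation between host-side management and device-side execution, and keep global state and side effects to a minimum. This turns non-deterministic concurrency bugs into reproducible logical units.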
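As a rough illustration of keeping device logic free of global state, the per-element computation can be written as a pure function whose inputs are all explicit. The same function can then be exercised from a plain host loop and, under `hipcc`, called from a kernel. The macro `CREED_HD` and the functions `stencil3` and `stencil3_host` are hypothetical names invented for this sketch, and the 3-point average is only a stand-in computation.

```cpp
#include <cassert>
#include <cstddef>

// When compiled with hipcc, mark the function as callable from both host and
// device; with an ordinary C++ compiler the macro expands to nothing.
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#define CREED_HD __host__ __device__
#else
#define CREED_HD
#endif

// Pure per-element logic: no globals, no side effects, every input explicit.
// Computes a 3-point average, clamping neighbours at the array boundaries.
CREED_HD inline float stencil3(const float* in, std::size_t n, std::size_t i) {
    const float left = (i > 0) ? in[i - 1] : in[i];
    const float right = (i + 1 < n) ? in[i + 1] : in[i];
    return (left + in[i] + right) / 3.0f;
}

// Host-side reference driver. A device kernel would call stencil3 with the
// same arguments, using its global thread index as i.
inline void stencil3_host(const float* in, float* out, std::size_t n) {
    assert(in != out);  // out-of-place only: in-place would read updated neighbours
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = stencil3(in, n, i);
    }
}
```

Because the element function depends only on its arguments, a wrong value can be reproduced on the host by calling it with the failing index, independent of launch configuration or stream ordering.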
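3. Memory/Concurrency Fatalism
We accept memory corruption and race conditions as the main "predators" of GPU performance. Because HIP is the primary low-level programming interface, the Creed makes careful synchronization and explicit memory ownership the baseline for every new kernel.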
QUESTION 1
According to the Creed, what is a statistically inconsistent 'fast' result considered?
An acceptable trade-off for real-time systems.
A failure.
A 'heuristic' optimization.
A driver-level anomaly.
✅ Correct!
Correctness is the foundation; a fast but wrong answer is useless in scientific and production computing.
❌ Incorrect
The creed explicitly states that speed without verifiable correctness is a failure.
QUESTION 2
Why is 'Isolation' emphasized in the GPU development workflow?
To prevent the GPU from accessing host memory.
To reduce the electricity consumption of the ROCm stack.
To transform non-deterministic concurrency bugs into reproducible logical units.
To hide kernel source code from other developers.
✅ Correct!
Isolation allows you to debug specific units without the noise of global state or asynchronous race conditions.
❌ Incorrect
Isolation is a diagnostic strategy to make bugs reproducible.
QUESTION 3
In the 'Hierarchy of Needs' for GPU development, what forms the wide base?
Peak TFLOPS Tuning.
Functional Correctness (CPU Parity).
Shared Memory Optimization.
Inline Assembly.
✅ Correct!
CPU parity ensures the mathematical logic is sound before GPU-specific complexities are added.
❌ Incorrect
Check the pyramid visual: Functional Correctness is the widest, most critical layer.
QUESTION 4
What does 'Memory/Concurrency Fatalism' imply for a developer?
Assuming that memory will never fail.
Accepting that race conditions are the primary predators of performance.
Ignoring error codes from hipMalloc.
Assuming the compiler handles all synchronization.
✅ Correct!
Fatalism here means recognizing the inherent dangers of parallel memory access and planning for them from the start.
❌ Incorrect
Fatalism means assuming these errors WILL happen unless specifically prevented.
QUESTION 5
What is the recommended first step when implementing a complex kernel like an FFT?
Optimize shared memory usage immediately.
Use inline PTX assembly for speed.
Implement a strictly isolated version using global memory and explicit synchronization.
Disable all error checking to measure raw latency.
✅ Correct!
Verified global memory logic serves as the 'Gold Standard' before introducing complex shared memory tiling.
❌ Incorrect
Jumping to shared memory shuffles before verifying the logic violates the Creed's correctness-first rule.
Case Study: The 'Fast but Wrong' Wavefront
Debugging a 3D Stencil Kernel
A developer migrates a 3D Wavefront Reconstruction kernel to ROCm. To maximize speed, they use volatile shared memory and skip hipDeviceSynchronize() calls. The output is 100x faster than the CPU but 2% of the values are slightly off-target during high-load production runs.
Q
Based on the GPU Developer's Creed, what is the immediate priority for this developer?
Solution:
The priority is Functional Correctness. The developer must revert the optimizations (shared memory/async) and implement a strictly isolated version using global memory and explicit synchronization to find the 'Golden Model' discrepancy.
Q
Which layer of the Hierarchy of Needs did the developer skip?
Solution:
The developer skipped the base layer (Functional Correctness) and the middle layer (Isolation & Safety) to jump directly to the narrow tip (Performance Tuning).
Q
How does 'Isolation' help solve the 2% error rate in this scenario?
Solution:
By isolating the kernel and comparing it bit-for-bit against a CPU reference, the developer can determine if the error is a logical math flaw or a non-deterministic race condition caused by shared memory concurrency.
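The distinction drawn in this solution, a logic flaw versus a race condition, can be sketched as a small host-side check: run the isolated kernel several times, compare the outputs bit-for-bit across runs, then compare against the CPU reference. The callable `run_kernel` is a hypothetical stand-in for "launch the kernel, synchronize, and copy the result back". The names `Diagnosis`, `bitwise_equal`, and `diagnose` and the default of 8 runs are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <vector>

enum class Diagnosis {
    MatchesReference,       // every run agrees with the CPU reference
    DeterministicMismatch,  // runs agree with each other but not the reference
    NonDeterministic        // runs disagree with each other
};

// Bit-for-bit equality of two float buffers (distinguishes +0.0 from -0.0
// and compares NaN payloads, unlike operator==).
inline bool bitwise_equal(const std::vector<float>& a,
                          const std::vector<float>& b) {
    return a.size() == b.size() &&
           (a.empty() ||
            std::memcmp(a.data(), b.data(), a.size() * sizeof(float)) == 0);
}

// run_kernel is a hypothetical callable that executes the isolated kernel
// once and returns its output on the host.
inline Diagnosis diagnose(const std::function<std::vector<float>()>& run_kernel,
                          const std::vector<float>& cpu_ref,
                          int runs = 8) {  // assumed default
    assert(runs >= 1);
    const std::vector<float> first = run_kernel();
    for (int i = 1; i < runs; ++i) {
        if (!bitwise_equal(run_kernel(), first)) {
            return Diagnosis::NonDeterministic;  // points toward a race
        }
    }
    return bitwise_equal(first, cpu_ref) ? Diagnosis::MatchesReference
                                         : Diagnosis::DeterministicMismatch;
}
```

A `NonDeterministic` result points toward a synchronization problem, while a `DeterministicMismatch` suggests a flaw in the math or indexing. Identical outputs across a handful of runs do not prove a race is absent, and legitimate floating-point differences between CPU and GPU may call for a tolerance-based comparison instead of a bit-for-bit one.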